
feat: add Euler CFG++ and Euler-A CFG++ samplers #1354

Merged
leejet merged 2 commits into leejet:master from daniandtheweb:euler_cfg_pp
May 14, 2026

Conversation

@daniandtheweb
Contributor

@daniandtheweb daniandtheweb commented Mar 17, 2026

This PR adds support for the Euler CFG++ and Euler Ancestral CFG++ samplers (CFG++).

The logic has been adapted from the CFG++ repository and checked against ComfyUI's implementation; I tried to keep the sampler style as close as possible to the existing ones.
Some changes were needed in src/stable-diffusion.cpp, as this sampler requires the unconditioned output in order to work.
This currently doesn't work with Spectrum cache.

As with any CFG++ sampler, you must use very low CFG values (for SDXL, often less than 2).

I'd be very grateful if anyone could review this, as it's the first sampler I've implemented that requires this kind of change.
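[Editor's sketch] For readers unfamiliar with CFG++, the core change is small: the Euler step still moves toward the CFG-combined prediction, but takes its noise-removal direction from the unconditioned prediction alone, which is why the sampler needs the unconditioned output. A minimal scalar sketch following the ComfyUI-style formulation; the function name and signature are illustrative, not this PR's actual API:

```cpp
#include <cassert>
#include <cmath>

// Illustrative one-step Euler CFG++ update on a single scalar.
// cond_out / uncond_out are the model's denoised predictions with and
// without conditioning; sigma / sigma_next are the current and next
// noise levels. Names and signature are hypothetical.
float euler_cfg_pp_step(float x, float cond_out, float uncond_out,
                        float sigma, float sigma_next, float cfg_scale) {
    // Standard classifier-free guidance combination.
    float denoised = uncond_out + cfg_scale * (cond_out - uncond_out);
    // CFG++: the derivative is computed from the *unconditioned* prediction.
    float d = (x - uncond_out) / sigma;
    // Euler update toward the guided prediction.
    return denoised + d * sigma_next;
}
```

At the final step (sigma_next = 0) this collapses to the guided prediction, as with plain Euler; only the direction term differs, which is why the guidance scale behaves so differently and must be kept low.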

@Green-Sky
Contributor

Green-Sky commented Mar 19, 2026

I wonder what the best way to integrate this is; binding this to samplers feels wrong.


A short test, trying to find the CFG++ value that most closely matches the reference image.

euler_a CFG 5 vs. euler_a_cfg_pp CFG 1.1
cyber_sd1_eulera_cfg5 output

@daniandtheweb
Contributor Author

I still think it's a good addition to the samplers. I tend to see more coherent generations at relatively higher CFG values, though I may be biased; this requires more testing.
When I have some time, I'll try to create a grid showing the differences across CFG values between the sampler variants, to get a better idea of whether this is worth adding as a sampler.

@daniandtheweb
Contributor Author

daniandtheweb commented Mar 23, 2026

From some initial testing with an SDXL model (cyberrealisticXL_V90; positive prompt: "a cute cat, best quality,"; negative prompt: "worst quality,"; 20 steps; seed 42; discrete scheduler), the highest image similarity between Euler and Euler CFG++ is at CFG 4.5 and 1.1 respectively (91.62% SSIM, 61.07 MSE).

screenshot-2026-03-23_20-28-23

There's also a high similarity between CFG 1.4 and 5.5 (90%).


Other than that, as the CFG++ paper describes, this method allows for a smoother generation trajectory:

Positive prompt: "a cute cat, best quality, black background",
Negative prompt: "worst quality, photo,",
Seed: 42, Scheduler: discrete

5 steps Euler CFG++ (1765151440) | 5 steps Euler (1765151441) | Target image, 20 steps Euler (1765151439)

Next I'll also try on more complex prompts.

@daniandtheweb
Contributor Author

I did some more testing; on a more complex generation, the difference between the generated images increases.
On the same test model (cyberrealisticXL_V90), 20 steps, with positive prompt: "dog jumping towards a door, outdoors, grass, house, best quality, photography" and negative prompt: "worst quality,", here's what I get:

Euler:       CFG 4.0 (1765151471) | 4.5 (1765151472) | 5.0 (1765151473) | 5.5 (1765151474) | 6.0 (1765151475)
Euler CFG++: CFG 1.1 (1765151476) | 1.2 (1765151477) | 1.3 (1765151478) | 1.4 (1765151479)
Euler CFG++: CFG 1.5 (1765151480) | 1.6 (1765151481) | 1.7 (1765151482) | 1.8 (1765151483)

The highest similarity in this case is between CFG 5.5 (Euler) and 1.3 (Euler CFG++), and it's still just 78%:
screenshot-2026-03-24_02-55-15

I can't really say if it's better or not, but given the smooth generation trajectory, I think this method may introduce fewer artifacts in certain kinds of images (I've only managed to reproduce that on a few samples, and I'm not very good at writing prompts, so I can't test that aspect thoroughly).

@wbruna wbruna mentioned this pull request Mar 24, 2026
@wbruna
Contributor

wbruna commented Mar 24, 2026

As I've mentioned in #1363, a better strategy could be turning the samplers into implementation objects, so that:

if (method == EULER_CFG_PP_SAMPLE_METHOD || method == EULER_A_CFG_PP_SAMPLE_METHOD) {
    LOG_WARN("Spectrum requested but not supported for CFG++ samplers");
    return;
}

would become something like this:

if (sampler->works_with_unconditioned_output()) {
    LOG_WARN("Spectrum requested but not supported for the %s sampler", sampler->pretty_name().c_str());
    return;
}

so we keep most sampler-specific quirks inside the sampler class.

OTOH, I don't think this implementation should necessarily wait for that refactor; it's not a lot of code, and it's already well tested, so it'd be perfectly fine to adapt it afterwards.
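[Editor's sketch] The sampler-as-object idea could look roughly like this; a minimal sketch under the editor's own naming assumptions (Sampler, EulerSampler, EulerCfgPpSampler are illustrative, not code from either PR):

```cpp
#include <string>

// Hypothetical sketch of the proposed refactor: each sampler reports its
// own quirks through virtual methods, so call sites stop switching on
// enum values.
struct Sampler {
    virtual ~Sampler() = default;
    virtual std::string pretty_name() const = 0;
    // CFG++ samplers need the unconditioned model output at every step.
    virtual bool works_with_unconditioned_output() const { return false; }
};

struct EulerSampler : Sampler {
    std::string pretty_name() const override { return "Euler"; }
};

struct EulerCfgPpSampler : Sampler {
    std::string pretty_name() const override { return "Euler CFG++"; }
    bool works_with_unconditioned_output() const override { return true; }
};
```

The Spectrum-cache guard would then query the object instead of enumerating sample methods, so adding another CFG++ variant wouldn't require touching the call site.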

@daniandtheweb
Contributor Author

daniandtheweb commented Mar 24, 2026

That seems like a good change for maintainability.

I've taken a look at your PR, and it certainly looks cleaner without the massive switch. While writing these samplers I've also been thinking a lot about how to use the eta parameter to control noise levels, so it's nice to see that implemented. I don't think it would be a problem to adapt this code to your PR even if yours gets merged first, as the overall structure is quite clear.

@daniandtheweb daniandtheweb marked this pull request as draft April 9, 2026 16:54
@daniandtheweb
Contributor Author

I'm converting this to a draft until I've reworked the code to be compatible with the latest changes.

@daniandtheweb daniandtheweb marked this pull request as ready for review May 5, 2026 17:39
@daniandtheweb
Contributor Author

I finally had some time to work on this and rebase it onto the latest master. I'm not entirely sure it can be considered fully polished as it is, but at least it works again.

Comment thread src/denoiser.hpp Outdated
}

if (hist.size() == static_cast<size_t>(max_order - 1)) {
if (hist.size() == static_cast<size_t>(max_order - 1), nullptr) {

Because this is a comma expression, the if condition will always evaluate to nullptr (i.e. false). Is that intentional?
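[Editor's note] To illustrate the pitfall being flagged (a generic C++ demonstration, not code from this PR): in a comma expression `(a, b)`, `a` is evaluated and its value discarded, and the whole expression takes the value of `b`. The stray `, nullptr` therefore makes the condition unconditionally false:

```cpp
// Demonstrates the comma-operator pitfall: the left operand is evaluated
// for side effects only, and the condition becomes the right operand,
// here nullptr, which contextually converts to false.
bool comma_condition_taken(bool cond) {
    if ((cond, nullptr)) {  // always false, regardless of cond
        return true;
    }
    return false;
}
```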

Contributor Author

I must have mistakenly added it during the refactor, thanks for noticing, I've just fixed that.

@masamaru-san masamaru-san May 9, 2026

Thanks.

Edit: I'm not a Contributor, so please feel free to ignore my comments.

@leejet leejet merged commit 9d68341 into leejet:master May 14, 2026
13 checks passed
@leejet
Owner

leejet commented May 14, 2026

Thank you for your contribution.

fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 15, 2026
Picks up 8 commits since the previous sync at 90e87bc:

  0b82969 docs: add .github/pull_request_template.md
  381e0df docs: add CONTRIBUTING.md
  0665a7f feat: add hidream o1 image support (leejet#1485)
  eeac950 fix: Use PkgConfig for WebP and WebM (leejet#1400)
  57ff2eb feat: support for memory-mapping model weights (leejet#1414)
  9d68341 feat: add Euler CFG++ and Euler-A CFG++ samplers (leejet#1354)
  60477fd docs: add new go bindings for stable-diffusion.cpp (leejet#1480)
  6ee0684 feat: display server url with "http://" prefix. (leejet#1486)

Conflicts, all in src/ggml_extend.hpp:

1. copy_data_to_backend_tensor signature: upstream made gf required
   (graph-cut needs the segment's graph to restrict uploads); our
   layer-streaming path needs gf=nullptr so each mini-graph uploads
   its full backend_tensor_data_map without filtering. Resolution:
   keep gf optional (default nullptr) and guard the graph_tensor_set
   filter on gf != nullptr. Upstream's new read_graph_tensor<T>
   template is added unchanged above copy_data_to_backend_tensor.

2. Tensor-loop null check: upstream added tensor/data null guards and
   a single ggml_get_name() lookup. Kept both, with our gf-gate
   layered on top of upstream's set-membership check.

3. alloc_params_buffer: upstream's mmap fast-path (skip allocation
   when every tensor already has data, since ggml_backend_alloc_ctx_tensors
   would hit n_buffers==0) and our pinned-host fast-path (allocate
   weights in the GPU device's host buffer for async H2D under
   offload) collide on the same function. Resolution: mmap check
   runs first and returns early — mmapped tensors can't be moved
   into pinned host memory — then the pinned-host path runs for the
   non-mmap CPU-params-with-GPU-runtime case, then the original
   pageable params_backend alloc as the final fallback.

Smoke-tested on Z-Image-Turbo Q8 at 512x512:
  --offload-mode layer_streaming  -> 4.0s total (coarse-stage path)
  --offload-to-cpu --max-vram 4   -> 8.3s total (3 graph-cut segments)

HiDream O1 streaming hooks deferred to a follow-up commit.